Dictionary Organization in Linguistic Automaton for Oriental Languages
Abstract
The central problem for natural language processing (NLP) systems dealing with non-Indo-European (“Oriental”) languages is how to develop automatic dictionaries (ADs) and dictionary entry (DE) schemes. The need for industrial NLP for Oriental languages has been felt for some time, and it has acquired additional urgency with the rapid growth of business contacts between Russia and the nations of the Middle East and the Pacific Rim. The very notions of root, stem, word form (w/f) and text word (t/w), which are essential in designing an AD, are quite distinct in each of the Oriental languages and fundamentally different from what we are used to treating as a root, a t/w, etc. in the Indo-European languages. If an Oriental-language AD is to be integrated into a multimodular linguistic automaton while the system retains its basic structure, various forms of sub-lexicon databases must be developed. The structure of the Arabic and Hebrew t/w requires four versions of the DE, while the distinction between full and structural words in Chinese requires two. An agglutinative word structure, as in the Turkic and Finno-Ugric languages, requires a tree-structured database and special access procedures.
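As a rough illustration of what a tree-structured database with a special access procedure might look like for an agglutinative language, the following minimal Python sketch stores each stem at the top of a tree and the affixes that may follow it as child nodes. It is not the authors' design: the names `Node`, `TreeLexicon`, `add` and `segment` are hypothetical, and the Turkish forms of *ev* ("house") are chosen here only as a familiar example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One morpheme in the tree, plus the morphemes allowed to follow it."""
    gloss: str = ""
    children: Dict[str, "Node"] = field(default_factory=dict)
    terminal: bool = False  # True if a complete word form may end here


class TreeLexicon:
    """Hypothetical tree-structured sub-lexicon: stem -> affix -> affix ..."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, morphemes: List[str], glosses: List[str]) -> None:
        """Register one word form as a path of morphemes from the root."""
        node = self.root
        for morpheme, gloss in zip(morphemes, glosses):
            node = node.children.setdefault(morpheme, Node(gloss=gloss))
        node.terminal = True

    def segment(self, word: str) -> Optional[List[str]]:
        """Access procedure: walk the tree and return one segmentation, or None."""

        def walk(node: Node, rest: str) -> Optional[List[str]]:
            if not rest:
                return [] if node.terminal else None
            for morpheme, child in node.children.items():
                if rest.startswith(morpheme):
                    tail = walk(child, rest[len(morpheme):])
                    if tail is not None:
                        return [morpheme] + tail
            return None

        return walk(self.root, word)


# Illustrative data (an assumption of this sketch, not taken from the paper).
lex = TreeLexicon()
lex.add(["ev"], ["house"])
lex.add(["ev", "ler"], ["house", "PL"])
lex.add(["ev", "de"], ["house", "LOC"])
lex.add(["ev", "ler", "de"], ["house", "PL", "LOC"])

print(lex.segment("evlerde"))
print(lex.segment("evdeler"))
```

Because affix order is encoded in the tree itself, a form whose affixes appear in an order that was never registered (such as the second lookup above) is rejected without any separate rule table.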
Similar resources
Deterministic Fuzzy Automaton on Subclasses of Fuzzy Regular ω-Languages
In formal language theory, we are mainly interested in the natural language computational aspects of ω-languages. Therefore in this respect it is convenient to consider fuzzy ω-languages. In this paper, we introduce two subclasses of fuzzy regular ω-languages called fuzzy n-local ω-languages and Buchi fuzzy n-local ω-languages, and give some closure properties for those subclasses. We define a ...
Automatic Dictionary Organization In NLP Systems For Oriental Languages
This paper presents a description of automatic dictionaries (ADs) and dictionary entry (DE) schemes for NLP systems dealing with Oriental languages. The uniformity of the AD organization and of the DE pattern does not prevent the system from taking into account the structural differences of isolating (analytical), agglutinating and internal-flection languages. The "Speech Statistics" (SpSt) pro...
Transculturation and Multilingual Lives: Writing between Languages and Cultures
This paper looks at the issues of transculturation as explored in auto and semi-autobiographical accounts of linguistic and cultural transitions. The paper also addresses a number of questions about the structure of these texts, the authors’ linguistic competences, as well as questions about the theoretical and conceptual tool which may help us to discuss the issues the writers are reflecting o...
The 2008 Oriental COCOSDA Book Project: in Commemoration of the First Decade of Sustained Activities in Asia
The purpose of Oriental COCOSDA is to provide the Asian community a platform to exchange ideas, to share information and to discuss regional matters on creation, utilization, dissemination of spoken language corpora of oriental languages and also on the assessment methods of speech recognition/synthesis systems as well as to promote speech research on oriental languages. Since its preparatory m...
Language Features of Russian Texts of Engineering Discourse
The article is devoted to the applied problem of identifying the linguistic features of engineering texts. The study of Russian-language texts of engineering discourse is usually applied in nature; in our case, the research is motivated by the need to teach foreigners who receive professional engineering education in Russia and in the Russian language. The object of the research is the Rus...